AITopics | code generation model

071a637d41ea290ac4360818a8323f33-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 10:32:15 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Uncovering and Quantifying Social Biases in Code Generation

Neural Information Processing SystemsApr-24-2026, 10:32:11 GMT

With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) with varying sizes, reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias1.

artificial intelligence, code generation model, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

EffiBench: Benchmarking the Efficiency of Automatically Generated Code

Neural Information Processing SystemsMar-18-2026, 09:57:12 GMT

Code generation models have increasingly become integral to aiding software development. Although current research has thoroughly examined the correctness of the code produced by code generation models, a vital aspect that plays a pivotal role in greencomputing and sustainability efforts -- the efficiency of the generated code -- has often been neglected. This paper presents Effibench, a benchmark with 1,000 efficiency-critical coding problems to assess the efficiency of code generated by code generation models. EffiBench contains a diverse set of LeetCode coding problems. Each problem is paired with an executable human-written canonical solution, which obtains the SOTA efficiency on the LeetCode solution leaderboard. With EffiBench, we empirically examine the ability of 42 large language models (35 open-source and 7 closed-source) to generate efficient code. Our evaluation results demonstrate that the efficiency of the code generated by LLMs is generally worse than the efficiency of human-written canonical solutions. For example, GPT-4 generated code has an average \textbf{3.12}

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)

Add feedback

5762c579d09811b7639be2389b3d07be-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 03:00:24 GMT

code generation model, dataset, generation model, (12 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)

Add feedback

Appendix Uncovering and Quantifying Social Biases in Code Generation

Neural Information Processing SystemsFeb-7-2026, 12:35:16 GMT

We conduct a preliminary study on finding a proper prompt construction strategy. Further research can utilize our analysis to construct more powerful code prompts. Table 1: Code prompt study results of CBS. N" means there are one human-relevant function Table 2: Automatic and human evaluation results of social biases in the generated code on GPT -4. We also conduct experiments on GPT -4.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.45)

Add feedback

Uncovering and Quantifying Social Biases in Code Generation Yan Liu Xiaokang Chen null Yan Gao

Neural Information Processing SystemsFeb-7-2026, 12:35:12 GMT

We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models.

code generation model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.67)

Add feedback

Uncovering and Quantifying Social Biases in Code Generation

Neural Information Processing SystemsDec-23-2025, 19:22:58 GMT

With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) with varying sizes, reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias.

code generation model, generation model, uncovering and quantifying social bias, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback

15807b6e09d691fe5e96cdecde6d7b80-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-9-2025, 19:17:32 GMT

canonical solution, efficiency, gpt-3, (9 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
Africa > Rwanda > Kigali > Kigali (0.04)
North America > Canada > Ontario > Toronto (0.04)
(10 more...)

Genre: Research Report > New Finding (0.93)

Industry: Energy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)

Add feedback

SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs

Zhang, Xinyu, Zhou, Changzhi, Hu, Linmei, Zhang, Luhao, Chen, Xiancai, Fu, Haomin, Yang, Yang, Zhang, Mengdi

arXiv.org Artificial IntelligenceSep-10-2025

Existing code large language models (LLMs) often rely on large-scale instruction data distilled from proprietary LLMs for fine-tuning, which typically incurs high costs. In this paper, we explore the potential of small-scale open-source LLMs (e.g., 7B) as synthesizers for high-quality code instruction data construction. We first observe that the data synthesis capability of small-scale LLMs can be enhanced by training on a few superior data synthesis samples from proprietary LLMs. Building on this, we propose a novel iterative self-distillation approach to bootstrap small-scale LLMs, transforming them into powerful synthesizers that reduce reliance on proprietary LLMs and minimize costs. Concretely, in each iteration, to obtain diverse and high-quality self-distilled data, we design multi-checkpoint sampling and multi-aspect scoring strategies for initial data selection. Furthermore, to identify the most influential samples, we introduce a gradient-based influence estimation method for final data filtering. Based on the code instruction datasets from the small-scale synthesizers, we develop SCoder, a family of code generation models fine-tuned from DeepSeek-Coder. SCoder models achieve state-of-the-art code generation capabilities, demonstrating the effectiveness of our method.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.07858

Country: